GUARDIAN: Safeguarding LLM Multi-Agent Collaborations with Temporal Graph Modeling
A Model-Agnostic Framework for Detecting Anomalies and Errors in Multi-Agent Language Model Systems
Authors: J. Zhou et al.
Published on Arxiv: 2025-05-25
Link: http://arxiv.org/abs/2505.19234v1
Institutions: King’s College London • Beijing Institute of Technology • Tsinghua University
Keywords: large language models, multi-agent systems, hallucination detection, error propagation, graph anomaly detection, temporal attributed graph, encoder-decoder architecture, information bottleneck, unsupervised learning, AI safety
The rise of large language models (LLMs) has enabled the development of intelligent agents capable of complex, multi-turn dialogues, but collaboration between multiple LLM agents presents distinct safety challenges—including hallucination amplification and error propagation—that threaten reliability and security. Existing defense approaches struggle to address the dynamic nature of error propagation in such multi-agent systems, or require architectural changes, which limits their applicability.
To address these challenges, the article presents a comprehensive solution known as GUARDIAN, detailing its methodology and main contributions:
- GUARDIAN introduces a unified defense framework for multiple intertwined safety threats in LLM multi-agent collaborations.
- The system models collaborative interactions as discrete-time temporal attributed graphs, where nodes are agent states and edges represent inter-agent communications.
- An unsupervised encoder-decoder architecture is proposed, enabling anomaly detection on both the node (agent) and edge (communication) level through attribute and structural reconstruction.
- A novel graph abstraction approach draws from Information Bottleneck Theory, compressing temporal interaction graphs while preserving patterns relevant to anomaly detection.
- GUARDIAN supports incremental training, allowing the system to adapt dynamically to evolving agent behaviors and anomalies without altering the underlying LLMs.
- The framework demonstrates flexibility and extensibility, with evaluations conducted across diverse benchmarks: MMLU, MATH, FEVER, and Biographies datasets.
The article then reports experimental findings that demonstrate the effectiveness and robustness of the GUARDIAN framework:
- GUARDIAN achieves state-of-the-art safety performance, improving absolute accuracy by 4–8% over competitive baselines in challenging scenarios (hallucination amplification, agent-targeted and communication-targeted attacks).
- Experiments reveal high anomaly detection rates (up to 94.74%) and substantial resilience to error propagation, all while maintaining low API resource consumption.
- The framework proves robust to scaling, effectively handling networks of 3–7 agents and multiple LLM instances (GPT, Claude, Llama3).
- Ablation studies underline the necessity of balancing attribute and structural reconstruction, alongside proper information bottleneck configuration, to achieve optimal detection accuracy.
In summary, the article concludes with several key takeaways that highlight the practical implications and future prospects of GUARDIAN:
- GUARDIAN offers an effective and efficient solution to safeguard multi-agent LLM collaborations from hazards such as hallucination amplification and error propagation.
- Temporal graph modeling combined with unsupervised anomaly detection enables a transparent, model-agnostic, and scalable safety assurance framework.
- The approach opens future research avenues, including broader application of graph-based anomaly detection to collaborative AI systems and deeper exploration of information flow constraints from a theoretical perspective.